The Long-term Effects of Universal Preschool in Boston
Guthrie Gray-Lobe, Parag Pathak, and Christopher Walters

Readme file for replication of results

Prepared by Hellary Zhang, completed August 30, 2022
Full replication certified by Jim Shen on August 30, 2022

Contents:
1. Data
2. Replication Instructions
3. Output


********************************************* 1. Data *********************************************

* Data notice:
This study uses confidential student-level data from Boston Public Schools (BPS) and the Massachusetts
Department of Elementary and Secondary Education (MA DESE). As of August 30, 2022, the links to initiate
research data requests with these parties are the following:
- BPS: https://www.bostonpublicschools.org/Page/7833
- MA DESE: https://www.doe.mass.edu/research/researchers.html

This Readme provides replication instructions using data obtained from BPS, MA DESE and other sources:

* Files from BPS:
- geo centroids.shp - a map of Boston's geocodes

- Student Assignment Report (Excel).xls - contains assignment data for Boston Public Schools preschool
  programs (BPS PreK), obtained by request from BPS. The data include each student's school preference,
  walk-zone priority, assigned school, and assigned program, and were used to replicate the Boston PreK
  matching mechanism between 1997 and 2003.

- EconomistStudentExtract.txt - This dataset tracks the enrollment status of students in BPS from 1997-2003 and
  includes attendance records, years of attendance, student names and BPS student numbers. This data set was
  obtained under a data use agreement with Boston Public Schools.

- BPSEnrollmentAugust2012.CSV - This dataset tracks the enrollment status of students in BPS from 2004-2011 and
  includes attendance records, years of attendance, student names and BPS student numbers. This data set was
  obtained under a data use agreement with Boston Public Schools.

* Files from MA DESE:

- sat_with_sasids_by_crosswalk_2007-2016.dta - contains the most recent SAT data for students in Massachusetts.

- mcas[yyyy].dta - contains MCAS scores for students in Massachusetts, including every MCAS test a
  student has taken.

- simsoct[yy] and simseoy[yy] - contain enrollment data for students in Massachusetts, including school
  of attendance, absences, suspensions, transfer status, student name, date of birth, etc.

* Files from the National Student Clearinghouse (NSC):

- 500708_T205676.202006291304_DA.csv - contains information such as the college of enrollment, type of
  college, date of enrollment, graduation status, and major. This data set is the result of a search of
  every name found in the Boston enrollment data. The search request was submitted in 2019.
  Requests can be made to the NSC at https://www.studentclearinghouse.org/colleges/studenttracker/

* Files from publicly available sources:

- tl_2010_25_bg00.shp - TIGER/Line shapefiles of the 2000 Census Block Groups obtained from
  https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.2000.html

- R13124478.txt, which is converted to "census_2000.dta" - 2000 Census Block Group data obtained from Social
  Explorer

- tbl[yy]Programs.csv - information about Head Start programs in year 19yy or 20yy. Data available to the
  public upon request, contact help@hsesinfo.org.
  See URL: https://eclkc.ohs.acf.hhs.gov/data-ongoing-monitoring/article/program-information-report-pir

- tbl[yy]Enrollment.csv - information about Head Start enrollment in year 19yy or 20yy. Data available to the
  public upon request, contact help@hsesinfo.org.
  See URL: https://eclkc.ohs.acf.hhs.gov/data-ongoing-monitoring/article/program-information-report-pir

- ELSI_excel_export_private_all_years - contains enrollment counts for private providers of PreK programs
  in Massachusetts. Downloaded from https://nces.ed.gov/ccd/elsi/tableGenerator.aspx
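
Before starting the replication, it may help to confirm that the input files named above are in place. The
following is a minimal sketch in Python, not part of the replication package. It assumes the files sit
somewhere under a single data directory that you pass in; this Readme does not specify that layout.
Patterned names (mcas[yyyy].dta, simsoct[yy], tbl[yy]Programs.csv, etc.) and the ELSI export are left out
because their exact years and extensions are not listed here.

```python
from pathlib import Path

# Fixed file names copied from the data list above. Patterned names such as
# mcas[yyyy].dta are omitted because the exact years are not given.
EXPECTED_FILES = [
    "geo centroids.shp",
    "Student Assignment Report (Excel).xls",
    "EconomistStudentExtract.txt",
    "BPSEnrollmentAugust2012.CSV",
    "sat_with_sasids_by_crosswalk_2007-2016.dta",
    "500708_T205676.202006291304_DA.csv",
    "tl_2010_25_bg00.shp",
    "R13124478.txt",
]


def missing_files(data_dir, expected=EXPECTED_FILES):
    """Return the expected file names not found anywhere under data_dir.

    data_dir is a hypothetical location chosen by the user; the search is
    recursive and matches on exact file name only.
    """
    root = Path(data_dir)
    found = {p.name for p in root.rglob("*") if p.is_file()}
    return [name for name in expected if name not in found]
```

For example, missing_files("path/to/your/data") returns an empty list once every fixed-name file is present.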

************************************ 2. Replication Instructions *************************************

To run the code from beginning to end, including cleaning data files, setting up the analysis files, and
producing the tables and graphs, follow the instructions below:

1. If running the code for the first time, create all relevant data folders by opening code/makerfile.do,
   replacing the paths with those on your machine, and running the file. Note that all Stata packages are
   pre-installed in code/ado/plus, in the versions used to run the code for this paper. The lines of code
   that install these packages are therefore commented out in code/makerfile.do.
2. Before starting the main data cleaning, follow step 1 in code/0_Data_Prep/QGIS/README.txt to create the
   intermediate file that maps BPS geocodes to Census Block Groups. This step uses QGIS, an application
   you will need to install on your machine.
3. Go to the do-files in the following subfolders:
   - 0_Data_Prep/
   - 1_Match_RDMD/
   - 2_Analysis_Prep/
   - 3_Analysis/
   - 3a_Figures/
   and switch on any code switches at the top of the .do files to run all sections of the code within each do-file.
4. Go to set_paths.ado and replace paths with your machine's.
5. Go to Master.do and turn on all switches and run it (it will call set_paths.ado).
6. The output will be written to the /results folder. To have the estimates flow through to the
   formatted Excel deck:
   a. Most of the table estimates are stored in "/results/tables/final_deck_results.xlsx".
      Paste the contents of this file into the tab called "raw" in the formatted deck, and the
      formatted tables will update with the estimates.
   b. Some of the results are output separately. Copy these into their
      corresponding tabs listed below:
      - /raw_tabs/fig_1_counts.xlsx: Copy into "F1_enroll_over_time" tab
      - /raw_tabs/matchrepk0k1.xlsx: Input into "A2_mech_replication" tab
          -> Copy column F in matchrepk0k1.xlsx into column 2 of the table.
             Copy column E of matchrepk0k1.xlsx into hidden column E of the
             "A2_mech_replication" tab.
      - /raw_tabs/Table_A3.xlsx: Copy into "Subst reg coeffs" tab
      - /raw_tabs/table_b1.xlsx: Copy into "B1_att_count" tab
      - /raw_tabs/table_b2_mcas_K1_m_sub.xlsx: Copy into "mcas_math" tab
      - /raw_tabs/table_b2_mcas_K1_e_sub.xlsx: Copy into "mcas_ela" tab
      - /raw_tabs/table_b3.xlsx: Copy into "B3_att_reg" tab
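
The file-to-tab pairs in step 6b can be kept as a small lookup so that none is skipped during the manual
copy. The sketch below is illustrative only and is not part of the replication package. The location of the
raw_tabs folder on your machine is an assumption passed in as an argument, and copy_checklist is a
hypothetical helper name. Note that matchrepk0k1.xlsx still needs the column-by-column copy described above.

```python
from pathlib import Path

# Raw output file -> tab in the formatted deck, transcribed from step 6b.
RAW_TO_TAB = {
    "fig_1_counts.xlsx": "F1_enroll_over_time",
    "matchrepk0k1.xlsx": "A2_mech_replication",
    "Table_A3.xlsx": "Subst reg coeffs",
    "table_b1.xlsx": "B1_att_count",
    "table_b2_mcas_K1_m_sub.xlsx": "mcas_math",
    "table_b2_mcas_K1_e_sub.xlsx": "mcas_ela",
    "table_b3.xlsx": "B3_att_reg",
}


def copy_checklist(raw_tabs_dir):
    """Return one line per raw output file, flagging files not yet produced.

    raw_tabs_dir is wherever the raw_tabs folder lives on your machine
    (an assumption; adjust to your own paths).
    """
    lines = []
    for filename, tab in RAW_TO_TAB.items():
        exists = (Path(raw_tabs_dir) / filename).is_file()
        status = "ready" if exists else "not found"
        lines.append(f'{filename} -> "{tab}" [{status}]')
    return lines
```

Printing the returned lines after Master.do finishes shows which files are ready to be copied and into which tab.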

******************************************* 3. Output ************************************************
The tables and figures in the paper are produced using code from the /code/3_Analysis/ folder and
/code/3a_Figures/, respectively. We list which code files contribute to each table and figure below:

/code/3_Analysis/a_balance.do                     produces estimates for Table 1 and Table 2

/code/3_Analysis/b_nsc.do                         produces estimates for Table 3

/code/3_Analysis/c_sims.do                        produces estimates for Table 4

/code/3_Analysis/d_tests.do                       produces estimates for Tables 5 and 6

/code/3_Analysis/c_sims.do                        produces estimates for Table 7

/code/3_Analysis/e_subgroup.do                    produces p-values for tests of equality between
                                                      subgroups for a given outcome for Table 8
/code/3_Analysis/f_T8_row_joint_pvals.do          produces p-values for joint tests of equality
                                                      across subgroups within an outcome for
                                                      column 8 in Table 8
/code/3_Analysis/g_T8_col_joint_pvals.do          produces p-values for joint tests of equality
                                                      across outcomes within a subgroup for the bottom
                                                      row in Table 8

/code/3_Analysis/c_sims.do                        produces estimates for columns 1-3, 5-6 in Table 9
/code/3_Analysis/d_tests.do                       produces estimates for column 5 in Table 9

/code/3_Analysis/T10_meta.m                       produces average effect and p-value for test of no
                                                      heterogeneity in Table 10 (in Matlab)

/code/3a_Figures/a_fig1.do                        produces the statistics in Figure 1

/code/1_Match_RDMD/a_soda.do                      produces the match replication rates in Appendix Table A2

/code/0_Data_Prep/f_head_start.do                 produces column 3 of Appendix Table A3

/code/3_Analysis/h_prek_substitution.do           produces estimates in Appendix Table A4

/code/3_Analysis/i_bal_post_attrit.do             produces estimates in Appendix Table A5

/code/3_Analysis/b_nsc.do                         produces estimates for Appendix Tables A6 to A10

/code/3_Analysis/c_sims.do                        produces estimates for Appendix Table A11

/code/3a_Figures/b_figA1_psc_hist                 produces Appendix Figure A1

/code/4_Data_Appendix/a_sample_construction.do    produces Data Appendix Table B1

/code/4_Data_Appendix/b_test_attrition.do         produces Data Appendix Table B2

/code/4_Data_Appendix/c_followup_rates.do         produces Data Appendix Table B3
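
Because some code files feed several tables (c_sims.do, for example), it can be useful to invert the list
above and ask which tables are affected by a given file. The sketch below is illustrative only and is not
part of the replication package. It covers just the main-text Tables 1 to 10 as listed above, and
build_indexes is a hypothetical helper name.

```python
from collections import defaultdict

# (code file, table) pairs for main-text Tables 1-10, in the order listed above.
PAIRS = [
    ("3_Analysis/a_balance.do", "Table 1"),
    ("3_Analysis/a_balance.do", "Table 2"),
    ("3_Analysis/b_nsc.do", "Table 3"),
    ("3_Analysis/c_sims.do", "Table 4"),
    ("3_Analysis/d_tests.do", "Table 5"),
    ("3_Analysis/d_tests.do", "Table 6"),
    ("3_Analysis/c_sims.do", "Table 7"),
    ("3_Analysis/e_subgroup.do", "Table 8"),
    ("3_Analysis/f_T8_row_joint_pvals.do", "Table 8"),
    ("3_Analysis/g_T8_col_joint_pvals.do", "Table 8"),
    ("3_Analysis/c_sims.do", "Table 9"),
    ("3_Analysis/d_tests.do", "Table 9"),
    ("3_Analysis/T10_meta.m", "Table 10"),
]


def build_indexes(pairs=PAIRS):
    """Return (tables by code file, code files by table) as plain dicts."""
    by_script = defaultdict(list)
    by_table = defaultdict(list)
    for script, table in pairs:
        by_script[script].append(table)
        by_table[table].append(script)
    return dict(by_script), dict(by_table)


BY_SCRIPT, BY_TABLE = build_indexes()
```

Here BY_SCRIPT["3_Analysis/c_sims.do"] lists the tables to re-check after rerunning that file, and
BY_TABLE["Table 8"] lists the three files that contribute to Table 8.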
